LAMP - TR - 065 CAR - TR - 962 CS - TR - 4218 4400019848
Author
Abstract
Document images undergo various degradation processes. Numerous models of these degradation processes have been proposed in the literature. In this paper we propose a model-based restoration algorithm. The restoration algorithm first estimates the parameters of a degradation model and then uses the estimated parameters to construct a lookup table for restoring the degraded image. The estimated degradation model is used to estimate the probability of an ideal binary pattern, given the noisy observed pattern. This probability is estimated by degrading noise-free document images and then computing the frequency of corresponding noise-free and noisy pattern pairs. This conditional probability is then used to construct a lookup table to restore the noisy images. The impact of the restoration process is then quantified by computing the decrease in OCR word and character error rate. We find that, given the estimated degradation model parameter values, the restoration algorithm decreases the character error rate by 16.1% and the word error rate by 7.35%. In some categories of degradation (e.g., model parameters that give rise to broken characters) there is a 41.5% reduction in character error rate and a 20.4% reduction in word error rate.
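The lookup-table idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the `degrade` function below is a stand-in that flips pixels independently (the paper's degradation model and its parameter estimation are not reproduced here), the 3x3 window size is an assumption, and the table maps each noisy window to the most frequent ideal centre pixel rather than to a full ideal pattern. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def degrade(image, flip_prob, rng):
    """Stand-in noise model (assumption): flip each pixel with probability flip_prob."""
    return [[1 - p if rng.random() < flip_prob else p for p in row] for row in image]


def window(image, r, c):
    """Return the 3x3 neighbourhood of (r, c) as a tuple; pixels outside the image count as 0."""
    h, w = len(image), len(image[0])
    return tuple(
        image[i][j] if 0 <= i < h and 0 <= j < w else 0
        for i in range(r - 1, r + 2)
        for j in range(c - 1, c + 2)
    )


def build_lookup_table(ideal_images, degrade_fn):
    """Degrade noise-free images and count (noisy window, ideal centre pixel) pairs.

    The table keeps, for each observed noisy window, the ideal centre value
    with the highest empirical conditional frequency.
    """
    counts = defaultdict(Counter)
    for ideal in ideal_images:
        noisy = degrade_fn(ideal)
        for r in range(len(ideal)):
            for c in range(len(ideal[0])):
                counts[window(noisy, r, c)][ideal[r][c]] += 1
    return {pattern: hist.most_common(1)[0][0] for pattern, hist in counts.items()}


def restore(noisy, table):
    """Replace each pixel by the table entry for its window; unseen windows keep the observed pixel."""
    return [
        [table.get(window(noisy, r, c), noisy[r][c]) for c in range(len(noisy[0]))]
        for r in range(len(noisy))
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [
        [0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 1, 0],
        [0, 1, 1, 1, 1, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    # Train on many independently degraded copies of the clean image.
    table = build_lookup_table([clean] * 200, lambda img: degrade(img, 0.05, rng))
    restored = restore(degrade(clean, 0.05, rng), table)
    for row in restored:
        print("".join(str(p) for p in row))
```

In a setting closer to the abstract, `degrade` would be replaced by the estimated degradation model, and the table would be trained on noise-free document images rather than a toy bitmap.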
Similar resources
Morphological Degradation Models and their Use in Document Image Restoration, Qigong Zheng and Tapas Kanungo
Document images undergo various degradation processes. Numerous models of these degradation processes have been proposed in the literature. In this paper we propose a model-based restoration algorithm. The restoration algorithm first estimates the parameters of a degradation model and then uses the estimated parameters to construct a lookup table for restoring the degraded image. The estimated de...
LAMP - TR - 145 CS - TR - 4877 UMIACS - TR - 2007 - 36 HCIL - 2007 - 10 July 2007 Exploring the Effectiveness of Related Article Search in PubMed
We describe two complementary studies that explore the effectiveness of related article search in PubMed. The first attempts to characterize the topological properties of document networks that are implicitly defined by this capability. The second focuses on analysis of PubMed query logs to gain an understanding of real user behavior. Combined evidence suggests that related article search is bo...
CAR - TR - 854 N 00014 - 96 - 1 - 0521 CS - TR - 3780 March 1997
Many types of common objects, such as tools and vehicles, usually move in simple ways when they are wielded or driven: The natural axes of the object tend to remain aligned with the local trihedron defined by the object's trajectory. Based on this observation we use a model called Frenet-Serret motion which corresponds to the motion of a moving trihedron along a space curve. Knowing how the Fre...
LAMP - TR - 119 CS - TR - 4695 UMIACS - TR - 2005 - 04 February 2005 Automatically Evaluating Answers to Definition Questions
Following recent developments in the automatic evaluation of machine translation and document summarization, we present a similar approach, implemented in a measure called Pourpre, for automatically evaluating answers to definition questions. Until now, the only way to assess the correctness of answers to such questions involves manual determination of whether an information nugget appears in a...
CAR - TR - 673 April 1993 CS - TR - 3078 ISR - 93 - 52 AlphaSlider: Searching Textual Lists with Sliders
AlphaSlider is a query interface that uses a direct manipulation slider to select words, phrases, or names from an existing list. This paper introduces a prototype of AlphaSlider, describes the design issues, reports on an experimental evaluation, and offers directions for further research. The experiment tested 24 subjects selecting items from lists of 40, 80, 160, and 320 entries. Mean select...
Publication date: 2001